Summarizing and understanding large graphs
Abstract
How can we succinctly describe a million-node graph with a few simple sentences? Given a large graph, how can we find its most ‘important’ structures, so that we can summarize it and easily visualize it? How can we measure the ‘importance’ of a set of discovered subgraphs in a large graph? Starting with the observation that real graphs often consist of stars, bipartite cores, cliques and chains, our main idea is to find the most succinct description of a graph in these ‘vocabulary’ terms. To this end, we first mine candidate subgraphs using one or more graph partitioning algorithms. Next, we identify the optimal summarization using the Minimum Description Length (MDL) principle, picking only those subgraphs from the candidates that together yield the best lossless compression of the graph—or, equivalently, that most succinctly describe its adjacency matrix. Our contributions are three-fold: (a) formulation: we provide a principled encoding scheme to identify the vocabulary type of a given subgraph for six structure types prevalent in real-world graphs, (b) algorithm: we develop VOG, an efficient method to approximate the MDL-optimal summary of a given graph in terms of local graph structures, and (c) applicability: we report an extensive empirical evaluation on multi-million-edge real graphs, including Flickr and the Notre Dame web graph.
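The selection step described above can be illustrated with a minimal sketch. It is not the VOG encoding: the cost function is a deliberately simplified two-part code (an assumed per-structure cost plus an assumed per-edge error cost), only two of the vocabulary types (cliques and stars) are modeled, and the greedy acceptance rule and all function names are hypothetical choices made for illustration.

```python
import math
from itertools import combinations

def edge(u, v):
    """Canonical form of an undirected edge."""
    return (u, v) if u < v else (v, u)

def structure_edges(kind, nodes):
    """Edges a structure claims exist. For a star, nodes[0] is the hub."""
    if kind == "clique":
        return {edge(u, v) for u, v in combinations(nodes, 2)}
    if kind == "star":
        hub = nodes[0]
        return {edge(hub, v) for v in nodes[1:]}
    raise ValueError(f"unknown structure type: {kind}")

def total_bits(graph_edges, n_nodes, model):
    """Toy two-part cost L(M) + L(E | M), in bits (assumed encoding)."""
    id_bits = math.log2(n_nodes)  # bits to name one node
    # Model cost: 1 bit for the type plus one node id per member.
    model_bits = sum(1 + len(nodes) * id_bits for _, nodes in model)
    covered = set()
    for kind, nodes in model:
        covered |= structure_edges(kind, nodes)
    # Error cost: every edge the model gets wrong (missing or superfluous)
    # is listed explicitly by naming its two endpoints.
    error = graph_edges ^ covered
    return model_bits + len(error) * 2 * id_bits

def greedy_summary(graph_edges, n_nodes, candidates):
    """Add a candidate only if it lowers the total description length."""
    model = []
    best = total_bits(graph_edges, n_nodes, model)
    ranked = sorted(candidates,
                    key=lambda c: total_bits(graph_edges, n_nodes, [c]))
    for cand in ranked:
        cost = total_bits(graph_edges, n_nodes, model + [cand])
        if cost < best:
            model.append(cand)
            best = cost
    return model, best

# Toy graph on 8 nodes: a 4-clique on {0,1,2,3} and a star with hub 4.
graph = structure_edges("clique", (0, 1, 2, 3)) | structure_edges("star", (4, 5, 6, 7))
candidates = [
    ("clique", (4, 5, 6, 7)),  # poor fit: the star is not a clique
    ("star", (4, 5, 6, 7)),
    ("clique", (0, 1, 2, 3)),
]
summary, bits = greedy_summary(graph, 8, candidates)
print(summary, bits)
```

On this toy input the empty model costs 54 bits under the assumed encoding, while the selected summary (the clique and the star) costs 26 bits; the ill-fitting clique candidate is rejected because it would add more error bits than it saves.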
Similar resources
Summarizing Static and Dynamic Big Graphs
Large-scale, highly-interconnected networks pervade our society and the natural world around us, including the World Wide Web, social networks, knowledge graphs, genome and scientific databases, medical and government records. The massive scale of graph data often surpasses the available computation and storage resources. Besides, users get overwhelmed by the daunting task of understanding and ...
Discovery of Rare Sequential Topic Patterns in Document Stream
When and Where: Predicting Human Movements Based on Social Spatial-Temporal Events. Ning Yang*, Sichuan University; Xiangnan Kong, University of Illinois at Chicago; Fengjiao Wang, University of Illinois at Chicago; Philip Yu, University of
Active Multitask Learning Using Both Latent and Supervised Shared Topics. Ayan Acharya*, University of Texas at Austin; Raymond Mooney, University of Texas at...
Summarizing Answer Graphs Induced by Keyword Queries
Keyword search has been popularly used to query graph data. Due to the lack of structure support, a keyword query might generate an excessive number of matches, referred to as “answer graphs”, that could include different relationships among keywords. An ignored yet important task is to group and summarize answer graphs that share similar structures and contents for better query interpretation ...
On Summarizing Large-Scale Dynamic Graphs
How can we describe a large, dynamic graph over time? Is it random? If not, what are the most apparent deviations from randomness – a dense block of actors that persists over time, or perhaps a star with many satellite nodes that appears with some fixed periodicity? In practice, these deviations indicate patterns – for example, research collaborations forming and fading away over the years. Whi...
VOG: Summarizing and Understanding Large Graphs
How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the ‘importance’ of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a ‘vocabulary’ of subgraph-types that often occur in real graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the most succinct desc...
Generating examples of paths summarizing RDF datasets
As datasets become too large to be comprehended directly, a need for data summarization arises. A data summary can present typical patterns commonly found in a dataset, from which high-level understanding of the data can be obtained. Nonetheless, such abstract understanding can be improved by providing concrete examples of the summary patterns. If possible, the chosen examples should be diverse...
Journal title:
- Statistical Analysis and Data Mining
Volume 8
Publication date: 2015